Quantum algorithm for linear systems of equations 
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I. INTRODUCTION 
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Solving linear systems of equations is a common problem that arises both on its own and as a 
subroutine in more complex problems: given a matrix A and a vector 6, find a vector x such that 
Ax = b. We consider the case where one doesn't need to know the solution x itself, but rather an 
approximation of the expectation value of some operator associated with x, e.g., x^ Mx for some 
matrix M. In this case, when A is sparse, N x N and has condition number k, classical algorithms 
can find x and estimate x^ Mx in 0{N,/k) time. Here, we exhibit a quantum algorithm for this task 
' that runs in poIy(log A'^, k) time, an exponential improvement over the best classical algorithm. 
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, Quantum computers are devices that harness quantum mechanics to perform computations in ways that classical 
' computers cannot. For certain problems, quantum algorithms supply exponential speedups over their classical coun- 
terparts, the most famous example being Shor's factoring algorithm [1]. Few such exponential speedups are known, 
and those that are (such as the use of quantum computers to simulate other quantum systems [2]) have so far found 
limited use outside the domain of quantum mechanics. This paper presents a quantum algorithm to estimate features 
of the solution of a set of linear equations. Compared to classical algorithms for the same task, our algorithm can be 
I ■ as much as exponentially faster. 

^ , Linear equations play an important role in virtually all fields of science and engineering. The sizes of the data sets 
' that define the equations are growing rapidly over time, so that terabytes and even petabytes of data may need to 
^ ] be processed to obtain a solution. In other cases, such as when discretizing partial differential equations, the linear 
equations may be implicitly defined and thus far larger than the original description of the problem. For a classical 
computer even to approximate the solution of N linear equations in N unknowns in general requires time that scales 
^f) . at least as TV. Indeed, merely to write out the solution takes time of order N. Frequently, however, one is interested 
^ ' not in the full solution to the equations, but rather in computing some function of that solution, such as determining 
, the total weight of some subset of the indices. We show that in some cases, a quantum computer can approximate the 
^ ' value of such a function in time which scales logarithmically in TV, and polynomially in the condition number (defined 
below) and desired precision. The dependence on N is exponentially better than what is achievable classically, while 
the dependence on condition number is comparable, and the dependence on error is worse. Thus our algorithm can 
achieve useful, and even exponential, speedups in a wide variety of settings where TV is large and the condition number 
is small. 

We sketch here the basic idea of our algorithm, and then discuss it in more detail in the next section. Given a 
Hermitian N x N matrix A, and a unit vector 6, suppose we would like to find x satisfying Ax = b. (We discuss 
later questions of efficiency as well as how the assumptions we have made about A and b can be relaxed.) First, the 
algorithm represents 6 as a quantum state |6) — X]i!=i I*)- Next, we use techniques of Hamiltonian simulation[3, 4] 
^ , to apply e*"^* to \b) for a superposition of different times t. This ability to exponentiate A translates, via the well- 
■ - ' known technique of phase estimation [5-7], into the ability to decompose \b) in the eigenbasis of A and to find the 
corresponding eigenvalues \j. Informally, the state of the system after this stage is close to Pj where 

Uj is the eigenvector basis of A, and \b) = X)jLi Pj We would then like to perform the linear map taking |Aj) 
to C\-^ where C is a normalizing constant. As this operation is not unitary, it has some probability of failing, 
which will enter into our discussion of the run-time below. After it succeeds, wc uncompute the register and are 

left with a state proportional to X^jli Pj^J^ Wi) = 1^) = 1^)- 

An important factor in the performance of the matrix inversion algorithm is k, the condition number of A, or the 
ratio between A's largest and smallest eigenvalues. As the condition number grows, A becomes closer to a matrix which 
cannot be inverted, and the solutions become less stable. Such a matrix is said to be "ill-conditioned." Our algorithms 
will generally assume that the singular values of A lie between l/n and 1; equivalently n~^I < A'' A < I. In this 
case, our runtime will scale as \og{N) / e, where e is the additive error achieved in the output state \x). Therefore, 
the greatest advantage our algorithm has over classical algorithms occurs when both k and 1/e are polylog(A^), in 
which case it achieves an exponential speedup. However, we will also discuss later some techniques for handling 
ill-conditioned matrices. 
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This procedure yields a quantum- mechanical representation \x) of the desired vector x. Clearly, to read out all the 
components of x would require one to perform the procedure at least N times. However, often one is interested not 
in X itself, but in some expectation value x^Mx, where M is some linear operator (our procedure also accommodates 
nonlinear operators as described below). By mapping M to a quantum- mechanical operator, and performing the 
quantum measurement corresponding to M, we obtain an estimate of the expectation value {x\ M\x) = x^Mx, as 
desired. A wide variety of features of the vector x can be extracted in this way, including normalization, weights in 
different parts of the state space, moments, etc. 

A simple example where the algorithm can be used is to see if two different stochastic processes have similar stable 
state [8]. Consider a stochastic process Xt = Axt-i + b, where the i'th coordinate in the vector xt represents the 
abundance of item i in time t. The stable state of this distribution is given by \x) = {I — A)~^ \b). Let Xf = A!x^_x-\-h\ 
and \x') = {I — A')~^ \b'). To know if \x) and \x') are similar, we perform the SWAP test between them [9]. We note 
that classically finding out if two probability distributions are similar requires at least 0{\/N) samples [10]. 

The strength of the algorithm is that it works only with 0(log iV)-qubit registers, and never has to write down all 
of A, b or X. In situations (detailed below) where the Hamiltonian simulation and our non-unitary step incur only 
poly log( ) overhead, this means our algorithm takes exponentially less time than a classical computer would need 
even to write down the output. In that sense, our algorithm is related to classical Monte Carlo algorithms, which 
achieve dramatic speedups by working with samples from a probability distribution on N objects rather than by 
writing down all N components of the distribution. However, while these classical sampling algorithms are powerful, 
we will prove that in fact any classical algorithm requires in general exponentially more time than our quantum 
algorithms to perform the same matrix inversion task. 

Outline The rest of the Letter proceeds by first describing our algorithm in detail, analyzing its run-time and com- 
paring it with the best known classical algorithms. Next, we prove (modulo some complexity-theoretic assumptions) 
hardness results for matrix inversion that imply both that our algorithm's run-time is nearly optimal, and that it runs 
exponentially faster than any classical algorithm. We conclude with a discussion of applications, generalizations and 
extensions. 

Related work Previous papers gave quantum algorithms to perform linear algebraic operations in a limited setting 
[11]. Our work was extended by [12] to solving nonlinear differential equations. 



II. ALGORITHM 



We now give a more detailed explanation of the algorithm. First, we want to transform a given Hcrmitian matrix A 
into a unitary operator e*^* which we can apply at will. This is possible (for example) if A is s-sparse and efficiently 
row computable, meaning it has at most s nonzero entries per row and given a row index these entries can be computed 
in time 0{s). Under these assumptions, Ref. [3] shows how to simulate e'"** in time 

d{\og{N)sH), 

where the O suppresses more slowly-growing terms (described in [13]). If A is not Hermitian, define 
As C is Hermitian, we can solve the equation Cy = ( ^ ) to obtain y = Applying this reduction if necessary. 



.0. 

the rest of the Letter assumes that A is Hermitian. 

We also need an efficient procedure to prepare \b). For example, if bi and J2T=ii 1^*1^ efficiently computable then 
we can use the procedure of Ref. [14] to prepare |6). Alternatively, our algorithm could be a subroutine in a larger 
quantum algorithm of which some other component is responsible for producing \b). 

The next step is to decompose \b) in the eigenvector basis, using phase estimation [5-7]. Denote by \uj) the 
eigenvectors of A (or equivalently, of e'"**), and by Xj the corresponding eigenvalues. Let 

for some large T. The coefficients of \^o) are chosen (following [5, 7]) to minimize a certain quadratic loss function 
which appears in our error analysis (see [13] for details). 
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Next we apply the conditional Hamiltonian evolution Yl'^=o ® g^Arto/T j^'o)'^ ® \h), where Iq = 0{K,/e). 

Fourier transforming the first register gives the state 



N T-1 
j=l fe=0 



(3) 



where are the Fourier basis states, and \ak\j\ is large if and only if Aj « Defining Afc := 2'Kk/to, we can relabel 
our \k) register to obtain 



N T-1 
j=l k=0 



Afc) \uj) 



Adding an ancilla qubit and rotating conditioned on A^^ yields 



N T-1 
j=l fc=0 



Afc) \uj) 



r'2 c 



where C — 0{1/ n). We now undo the phase estimation to uncompute the Xkj- If the phase estimation were perfect, 
we would have a^j = 1 if A^ = \j, and otherwise. Assuming this for now, we obtain 

AT / 



To finish the inversion we measure the last qubit. Conditioned on seeing 1, we have the state 



N ^ 



VEf=iC2|/3,f/|A,f^''^A, 

which corresponds to \x) = 5^"=i/3jAj^ \uj) up to normalization. We can determine the normalization factor from 
the probability of obtaining 1. Finally, we make a measurement M whose expectation value {x\ M \x} corresponds to 
the feature of x that we wish to evaluate. 

Run-time and error analysis We present an informal description of the sources of error; the exact error analysis and 
runtime considerations are presented in [13]. Performing the phase estimation is done by simulating e'^*. Assuming 
that A is s-sparse, this can be done with error e in time proportional to ts"^ {t / e)°''^^ =: 0{ts'^). 

The dominant source of error is phase estimation. This step errs by O(l/to) in estimating A, which translates into 
a relative error of O(l/Ato) in A~^. If A > I/k taking to = 0{K./e) induces a final error of e. Finally, we consider the 
success probability of the post-selection process. Since C = 0(l/fi;) and A < 1, this probability is at least Q.{1/k^). 
Using amplitude amplification [15], we find that 0{k) repetitions are sufficient. Putting this all together, we obtain 
the stated runtime of O (log(Af)s^K^/e). 



III. OPTIMALITY 



Classical matrix inversion algorithms To put our algorithm in context, one of the best general-purpose classical ma- 
trix inversion algorithms is the conjugate gradient method [16], which, when A is positive definite, uses 0(\/Klog(l/e)) 
matrix-vector multiplications each taking time 0{Ns) for a total runtime of 0(A'^SY^log(l/e)). (If A is not positive 
definite, 0(Klog(l/e)) multiplications are required, for a total time of 0(iVsK log(l/e)).) An important question is 
whether classical methods can be improved when only a summary statistic of the solution, such as x^ Mx, is required. 
Another question is whether our quantum algorithm could be improved, say to achieve error e in time proportional to 
poly log(l/e). We show that the answer to both questions is negative, using an argument from complexity theory. Our 
strategy is to prove that the ability to invert matrices (with the right choice of parameters) can be used to simulate 
a general quantum computation. 
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The com,plexity of matrix inversion We show that a quantum circuit using n qubits and T gates can be simulated 
by inverting an 0(l)-sparse matrix A of dimension N — 0{2"k). The condition number k is 0{T^) if we need A to 
be positive definite or 0{T) if not. This implies that a classical poly(log A^, k, l/e)-time algorithm would be able to 
simulate a poly(n)-gate quantum algorithm in poly(n) time. Such a simulation is strongly conjectured to be false, 
and is known to be impossible in the presence of oracles [17]. 

The reduction from a general quantum circuit to a matrix inversion problem also implies that our algorithm cannot 
be substantially improved (under standard assumptions). If the run-time could be made polylogarithmic in k, then any 
problem solvable on n qubits could be solved in poly(n) time (i.e. BQP=PSPACE), a highly unlikely possibility. 
Even improving our K-dependence to k^~^ for 5 > would allow any time-T quantum algorithm to be simulated 
in time o(T); iterating this would again imply that BQP=PSPACE. Similarly, improving the error dependence 
to polylog(l/e) would imply that BQP includes PP, and even minor improvements would contradict oracle lower 
bounds [18] . 

The reduction We now present the key reduction from simulating a quantum circuit to matrix inversion. Let C be 
a quantum circuit acting on n = log A'' qubits which applies T two-qubit gates Ui, . . . Ut- The initial state is |0)®" 
and the answer is determined by measuring the first qubit of the final state. 

Now adjoin an ancilla register of dimension 3T and define a unitary 

T 

U = Y, '»Ut + |i+T+l)(i+T| 7 + \t+2T+l mod 3T){t+2T\ O f/gV+i-f (4) 

t=i 

We have chosen U so that for T + 1 < t < 2T, applying [/* to |1) lip) yields \t + 1) iSi Ut ■ ■ - Ui \tp). If we now define 
A = I — Ue~^^^ then k,{A) = 0{T), and we can expand 

^-1 = ^ L/'^e-'^/^, (5) 

k>0 

This can be interpreted as applying for t a geometrically-distributed random variable. Since U^^ = I, we can 
assume 1 < t < 3T. If we measure the first register and obtain T + 1 < t < 2T (which occurs with probability 
e^^/(l + e^^ + e^^) > 1/10) then we arc left with the second register in the state Ut ■ ■ - Ui \^p), corresponding to a 
successful computation. Sampling from \x) allows us to sample from the results of the computation. This establishes 
that matrix inversion is BQP-complete, and proves our above claims about the difiiculty of improving our algorithm. 

IV. DISCUSSION 

There are a number of ways to extend our algorithm and relax the assumptions we made while presenting it. We will 
discuss first how to invert a broader class of matrices and then consider measuring other features of x and performing 
operations on A other than inversion. 

Certain non-sparse A can be simulated and therefore inverted; see [4] for techniques and examples. It is also possible 
to invert non-square matrices, using the reduction presented from the non-Hermitian case to the Hermitian one. 

The matrix inversion algorithm can also handle ill-conditioned matrices by inverting only the part of \b) which is 
in the well-conditioned part of the matrix. Formally, instead of transforming \b) = J2j l^i) \x) = ^J^Pj \uj), 
we transform it to a state which is close to 

E [well) + E 

j,Xj<l/K j,Aj>l/re 

in time proportional to for any chosen k (i.e. not necessarily the true condition number of A). The last qubit is a 
flag which enables the user to estimate what the size of the ill-conditioned part, or to handle it in any other way she 
wants. This behavior can be advantageous if we know that A is not invertible and we are interested in the projection 
of 1 6) on the well-conditioned part of A. 

Another method that is often used in classical algorithms to handle ill-conditioned matrices is to apply a 
preconditioner[19]. If we have a method of generating a preconditioner matrix B such that k{AB) is smaller than 
k(A), then we can solve Ax = b by instead solving the possibly easier matrix inversion problem {AB)(: — Bb. Further, 
if A and B are both sparse, then AB is as well. Thus, as long as a state proportional to B \b) can be efficiently 
prepared, our algorithm could potentially run much faster if a suitable preconditioner is used. 



5 

The outputs of the algorithm can also be generalized. Wc can estimate degree-2fc polynomials in the entries of x 
by generating k copies of |a;) and measuring the n/e-qubit observable 

Mi^,...,ik,ju-,jk \h,---,ik){ji,---,jk\ 

ii,...,ik,ji,---,jk 

on the state \x)^'^ . Alternatively, one can use our algorithm to generate a quantum analogue of Monte-Carlo, where 
given A and b we sample from the vector x, meaning that the value i occurs with probability 

Perhaps the most far-reaching generalization of the matrix inversion algorithm is not to invert matrices at all! 
Instead, it can compute f{A) \b) for any computable /. Depending on the degree of nonlinearity of /, nontrivial 
tradeoffs between accuracy and efficiency arise. Some variants of this idea are considered in [4, 12, 20]. 

Acknowledgements. We thank the W.M. Keck fomidation for support, and AWH thanks them as well as MIT 
for hospitality while this work was carried out. AWH was also funded by the U.K. EPSRC grant "QIP IRC." SL 
thanks R. Zecchina for encouraging him to work on this problem. We are grateful as well to R. Cleve, D. Farmer, S. 
Gharabian, J. Kelner, S. Mitter, P. Parillo, D. Spielman and M. Tegmark for helpful discussions. 

APPENDIX A: PROOF DETAILS 

In this appendix, we describe and analyze our algorithm in full detail. While the body of the paper attempted to 

convey the spirit of the procedure and left out various improvements, here we take the opposite approach and describe 
everything, albeit possibly in a less intuitive way. We also describe in more detail our reductions from non-Hermitian 
matrix inversion to Hermitian matrix inversion (Section A 4) and from a general quantum computation to matrix 
inversion (Section A 5). 

As inputs we require a procedure to produce the state a method of producing the < s non-zero elements of any 
row of A and a choice of cutoff k. Our run-time will be roughly quadratic in k and our algorithm is guaranteed to be 
correct if ||^|| < 1 and \\A-^\\ < k. 

The condition number is a crucial parameter in the algorithm. Here we present one possible method of handling 
ill-conditioned matrices. We will define the well-conditioned part of A to be the span of the eigenspaces corresponding 
to eigenvalues >1/k and the ill-conditioned part to be the rest. Our strategy will be to flag the ill-conditioned part of 
the matrix (without inverting it), and let the user choose how to further handle this. Since we cannot exactly resolve 
any eigenvalue, we can only approximately determine whether vectors are in the well- or ill-conditioned subspaces. 
Accordingly, we choose some k' > k (say k' = 2k) . Our algorithm then inverts the well-conditioned part of the matrix, 
flags any eigenvector with eigenvalue < 1/k' as ill-conditioned, and interpolates between these two behaviors when 
1/k' < |A| < This is described formally in the next section. We present this strategy not because it is necessarily 
ideal in all cases, but because it gives a concrete illustration of the key components of our algorithm. 

Finally, the algorithm produces \x) only up to some error e which is given as part of the input. We work only with 
pure states, and so define error in terms of distance between vectors, i.e. j] \a) — 1/3) II = A/2(l-Re(Q;|/3)). Since 
ancilla states are produced and then imperfectly uncomputed by the algorithm, our output state will technically have 
high fidelity not with \x) but with \x) |000 . . .). In general we do not write down ancilla qubits in the |0) state, so we 
write I a;) instead of \x) |000 . . .) for the target state, |6) instead of \b) |000 . . .) for the initial state, and so on. 

1. Detailed description of the algorithm 

To produce the input state |6), wc assume that there exists an efficiently-implcmcntable unitary B, which when 
applied to |initial) produces the state \b), possibly along with garbage in an ancilla register. We make no further 
assumption about B; it may represent another part of a larger algorithm, or a standard state-preparation procedure 
such as [14]. Let Tb be the number of gates required to implement B. We neglect the possibility that B errs in 
producing \b) since, without any other way of producing or verifying the state \b), we have no way to mitigate these 
errors. Thus, any errors in producing |6) necessarily translate directly into errors in the final state \x). 

Next, we define the state 

T=0 

for a T to be chosen later. Using [14], we can prepare |^'o) up to error in time poly log (T/e^). 



(Al) 
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One other subroutine we will need is Hamiltonian simulation. Using the reductions described in Section A 4, we 
can assume that A is Hermitian. To simuluate e*^* for some t>0,we use the algorithm of [3]. If A is s-sparse, t < to 
and we want to guarantee that the error is < e^, then this requires time 



Th = 0(log(Ar)(log*(iV))2s2to9Vi°s(«'*oA^^)) = d{log{N)sHo) (A2) 

The scaling here is better than any power of l/en, which means that the additional error introduced by this step 

introduces is negligible compared with the rest of the algorithm, and the runtime is almost linear with to- Note that 
this is the only step where we require that A be sparse; as there are some other types of Hamiltonians which can be 
simulated efficiently (e.g. [3, 4, 21]), this broadens the set of matrices we can handle. 
The key subroutine of the algorithm, denoted J/invert) is defined as follows: 

1. Prepare |^o) from |0) up to error e*. 

2. Apply the conditional Hamiltonian evolution J2^Zo ® g»Arto/T error ch- 

3. Apply the Fourier transform to the register C. Denote the resulting basis states with |A;), for fc = 0, . . . T — 1. 

Define Afe := 2nk/to. 

4. Adjoin a three-dimensional register S in the state 

s 



h{Xk)) ■■= ^Jl- f{\k?-g{\k? [nothing)^ + /(A^) |well)^ + g{Xk) |ill) 



s 



for functions /(A), g(A) defined below in (A3). Here 'nothing' indicates that the desired matrix inversion hasn't 
taken place, 'well' indicates that it has, and 'ill' means that part of |6) is in the ill-conditioned subspace of A. 

5. Reverse steps 1-3, uncomputing any garbage produced along the way. 

The functions f{X),g{X) are known as filter functions [22], and are chosen so that for some constant C > 1: 
/(A) = 1/CkX for A > 1/k, .9(A) = 1/C for A < 1/k' := 1/2k and p{X) + g'^iX) < 1 for all A. Additionally, /(A) 
should satisfy a certain continuity property that we will describe in the next section. Otherwise the functions are 
arbitrary. One possible choice is 



/(A) = 



2k ^ whenA>l/K 

5 sin (f • ) when ^ > A > ^ (A3a) 
when ^ > A 



when A > 1/k 
g{X) ={ i cos (f • when ^ > A > ^ (A3b) 

1 when ^ > A 

If ?7invert is apphcd to \uj) it will, up to an error we will discuss below, adjoin the state \h{Xj)). Instead if we apply 
f^invert to \b) (i.e. a supcrpositiou of different \uj)), measure S and obtain the outcome 'well', then we will have 
approximately applied an operator proportional to A~^. Let p (computed in the next section) denote the success 
probability of this measurement. Rather than repeating 1/p times, we will use amplitude amplification [15] to obtain 
the same results with 0{l/y/p) repetitions. To describe the procedure, we introduce two new operators: 

i2succ = /^-2|well)(wen|^, 

acting only on the S register and 

-Rinit = I — 2 1 initial) (initial I . 

Our main algorithm then follows the amplitude amplification procedure: we start with [/invert -B [initial) and repeat- 
edly apply ?7invert-Bi?init-B^t^hivert-^succ- Finally we measure S and stop when we obtain the result 'well'. The number 
of repetitions would ideally be -k j'^^fj), which in the next section we will show is 0(k). While p is initially unknown, 
the procedure has a constant probability of success if the number of repetitions is a constant fraction of 7r/4p. Thus, 
following [15] we repeat the entire procedure with a geometrically increasing number of repetitions each time: 1, 2, 
4, 8, ... , until we have reached a power of two that is > k. This yields a constant probability of success using < 4k 
repetitions. 

Putting everything together, the run-time is 0(K(TB+to.s^ log(A^)), where the O suppresses the more-slowly growing 
terms of (log*(Af))^, exp(0(l/i/log(to/eff ))) and poly log(T/e*). In the next section, we will show that to can be 
taken to be 0{K,/e) so that the total run-time is 0{kTb + t^s^ log{N)/e). 
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2. Error Analysis 

In this section we show that taking to = 0{K/e) introduces an error of < e in the final state. The main subtlety in 
analyzing the error comes from the post-sclcction step, in which wc choose only the part of the state attached to the 
|well) register. This can potentially magnify errors in the overall state. On the other hand, we may also be interested 
in the non-postselected state, which results from applying f/invert a single time to \h). For instance, this could be used 
to estimate the amormt of weight of \h) lying in the ill-conditioned components of A. Somewhat surprisingly, we show 
that the error in both cases is upper-bounded by 0{K/to). 

In this section, it will be convenient to ignore the error terms en and e*, as these can be made negligible with 
relatively little effort and it is the errors from phase estimation that will dominate. Let U denote a version of C/invert 
in which everything except the phase estimation is exact. Since \\U — !7invert|| < 0{eH + e*), it is sufficient to work 
with U. Define U to be the ideal version of i/invert in which there is no error in any step. 

Theorem 1 (Error bounds). 

1. In the case when no post-selection is performed, the error is bounded as 

\\U-U\\<0{K/to). (A4) 

2. If we post-select on the flag register being in the space spanned by {\welt) ,\ilt)} and define the normalized ideal 
state to be \x) and our actual state to be \x) then 

II \x) - \x) II < 0{K/to). (A5) 

3. If\b) is entirely within the well- conditioned subspace of A and we post-select on the flag register being \ welt) then 

II 15;) - \x) II < 0(«Ao). (A6) 

The third claim is often of the most practical interest, but the other two are useful if we want to work with the 
ill-conditioned space, or estimate its weight. 

The rest of the section is devoted to the proof of Theorem 1 . Wc first show that the third claim is a corollary of 
the second, and then prove the first two claims more or less independently. To prove (A5 assuming (A4), observe 
that if |6) is entirely in the well-conditioned space, the ideal state |a;) is proportional to A~^ \b) |well). Model the 
post-sclcction on |wcll) by a post-selection first on the space spanned by {|wcll) , |ill)}, followed by a post-selection 
onto |wcU). By (A4), the first post-selection leaves us with error 0{K/to)- This imphes that the second post-selection 
will succeed with probability > 1 — 0{k^ /to) and therefore will increase the error by at most 0{K/to). The final error 
is then O{n/to) as claimed in (A6). 

Now we turn to the proof of (A4). A crucial piece of the proof will be the following statement about the continuity 
of IMA)). 

Lemma 2. The map A ^ \h{X)) is 0{K)-Lipschitz, meaning that for any Ai ^ A2, 

II |/i(Ai)) - |/i(A2)) II = ^/2{l-Re{h{Xl)\h{\2))) < ck\\i - A2I, 

for some c = 0{l). 

Proof. Since A \h{X)) is continuous everywhere and differcntiablc everywhere except at 1/k and it suffices to 
bound the norm of the derivative of \h{X)). We consider it piece by piece. When A > 1/k, 

4r \h(X)) = , ^ Inothing) 3_ |well) , 

dX^ ^ 2k2a3^1-1/2k2A2 ' 2kA2' 

which has squared norm 2k^X'^(2k'^X^-i) ~^ Ji^x^ — ^^^^^ when 1/k' < X < 1/k, the norm of ^ \h{X)) is 

1 TT 1 TT 

2 ■ 2 ■ Tzn; = 2'^- 

K K 



Finally ^ \h{X)) = when A < 1/k'. This completes the proof, with c = ^. 



□ 
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Now we return to the proof of (A4). Let P denote the first three steps of the algorithm. They can be thought of 
as mapping the initial zero qubits to a |fc) register, together with some garbage, as follows: 

n 

^ = X] l'^) |garbage(j, A;)) (initial] , 

3=1 k 

where the guarantee that the phase estimation algorithm gives us is that a^|j is concentrated around Xj « 2Trk/to =■ 

Afc. Technically, P should be completed to make it a unitary operator by defining some arbitrary behavior on inputs 
other than | initial) in the last register. 

Consider a test state |6) = X^^i /3j \uj). The ideal functionality is defined by 

N 

|^) = C/|6) = ^/?,K)|MA,)), 

i=i 

while the actual algorithm produces the state 

N 

1^) = u\b) = p^Y. Pi 1"^) E "fcb- 1^) K^'^)) ' 

Wc wish to calculate {(p\ip)-, or equivalently the inner product between P |<^) and P \ip) = J2j,k i^j^k\j \k) \h{Xj)). 
This inner product is 

N 

im = E l^^-l' E M {hC^k)\h{X,)) := E,E, (^hCXk)\h{X,)) , 
j=i k 

where we think of j and k as random variables with joint distribution Pr(j, k) = \0j\'^\ak\j\'^ ■ Thus 

Re{<^\ifi) =EjEfcRe(/i(Afe)|/i(Aj)^ 



Let S = Xjto — 2'Kk = to{Xj — Afe). Prom Lemma 2, Re (^h(Xk)\h{Xj)J > 1 — c^k^5'^ /2tQ, where c < | is a constant. 

There arc two sources of infidelity. For 5 < 2tt, the inner product is at least 1 — 2'k'^c^ /t^. For larger values of 5, 
we use the bound |Q:fe|jP ^ 647r^/(Ajto — 2nk)'^ (proved in Section A3) to find an infidelity contribution that is 



fe=i 



Summarizing, we find that Re{ip\(p) > 1 — 5tt^c^k^ /tg, which translates into || \<f) — \(p) || < AncK/to = 2'n^K/to. 
Since the initial state \b) was arbitrary, this bounds the operator distance ||C/ — ?7|| as claimed in (A4). 
Turning now to the post-selected case, we observe that 

I . f{A)\h) |well)+g(A)|&)|ill) 
^{b\{f{AY+g{AY)\h) 



Ej/3.-|%-)(/(Aj-)|well)+ff(A,)|ill)) 



(AS) 



7E,I/3.P(/(A,)^+.9(A 

(A9) 



E,/3.l«.X/(A,)|well)+5(A,)|ill)) 



Where in the last step we have defined 



p:=E,[/(A,)2+3(A,)2] 



to be the probability that the post-selection succeeds. Naively, this post-selection could magnify the errors by as 
much as 1/ y/p, but by careful examination of the errors, we find that this worst-case situation only occurs when the 
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errors are small in the first place. This is what will allow us to obtain the same 0{K/to) error bound even in the 
post-selected state. 

Now write the actual state that we produce as 

_ M Efc \k) ifiXk) Iwell) + g(Afc) |ill)) 

\x) .— = lAiUj 

^E,-,/(Afe)2+g(Afc)2 

Pj \uj) Efc c^fcb |fc) (/(Afe) Iwell) + ff(Afe) |ill)) 
— • , \^^^) 

where we have defined p = Ej^fc[/(Afc)^ + g{Xk)^]. 

Recall that j and k are random variables with joint distribution Pr(j, k) — \/3j\'^\ak\j\^ ■ We evaluate the contribution 
of a single j value. Define A := Xj and A := 2Trk/to- Note that S = to(A — A) and that E5, = 0(1). Here S depends 
implicitly on both j and k, and the above bounds on its expectations hold even when conditioning on an arbitrary 
value of j. We further abbreviate / := /(A), / := /(A), g := g{X) and g = g{X). Thus p := E[/^ + g'^] and 
p = E[p+g% 

Our goal is to bound || |a;) — \x) \\ in (A5). We work instead with the fidelity 

F := (ib) = ^^ff + 3~9] = nf+9^]+n{f-f)f+{~9-9)9] .^^2) 



p 



2 p 



Next we expand 



p-p = E[p-f]+E[g^-g^] (A14) 
= E[(/ - /)(/ + /)] + E[(g - g)Cg + g)] (A15) 
= 2E[(/ - /)/] + 2E[{g - g)g] + E[(/ - ff] + E[{g - gf] (A16) 

Substituting into (A13), we find 

E[(/-/)^+(fl-fln E[(/-/)/+(ff-ff)ff] 

~ 2p p 2p ^ 

We now need an analogue of the Lipschitz condition given in Lemma 2. 
Lemma 3. Let f, f, g, g he defined as above, with k' = 2k. Then 



where c = tt /2. 



K 


- ^2 



\f-ff + \9-~9f<c^SV' + g'\ 



Proof. Remember that f = f{X — 6/to) and similarly for g. 

Consider the case first when A > 1/k. In this case 5 = 0, and we need to show that 

l/-/l<2^ = ^ (A18) 

To prove this, we consider four cases. First, suppose A > 1/k. Then \f — f\ = ^ ^^Xx — I'^l/^^oA. Next, suppose 
A = 1/k (so / = 1/2) and A < 1/k. Since sin |a > a for < a < 1, we have 
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and using A = 1/k wc find that |/ — /| — ^^j^, as desired. Next, ifA< 1/k<A and / < / then replacing A with 1/k 
only makes the inequality tighter. Finally, suppose A < 1/k < A and f < f- Using (A19) and A > 1/k we find that 
/ - / < 1 - < 1 - A/A = (A - A)/A, as desired. 
Now, suppose that X <1/k. Then 



|/-/P<^max|/'p = ^^K^ 



And similarly 



< 72 max|5T = -TTs^'- 

In 41 In 



.2 ^2 



Finally /(A)^ + g{X)'^ = 1/2 for any A < 1/k, implying the result. 

Now we use Lemma 3 to bound the two error contributions in (A13). First bound 



2p 



< o 



ti 



k2\ E[(/2+ 52)521 



E[/2 + g2] 



< o 



□ 



(A20) 



The first inequality used Lemma 3 and the second used the fact that £[^2] < 0(1) even when conditioned on an 
arbitrary value of j (or equivalently 
Next, 



m-f)f +{9-9)9] 



E 



< 



^{{J-fY + ig-g?) + e[^^(/2 + 32)2 



< o 



to 



(A21) 



where the first inequality is Cauchy-Schwartz, the second is Lemma 3 and the last uses the fact that E[|5|] < ^E[(52] = 
0(1) even when conditioned on j. 
We now substitute (A20) and (A21) into (A16) (and assume k < to) to find 



to 



(A22) 



Substituting (A20), (A21) and (A22) into (A17), we find Re {x\x) > 1 - 0{K^/tl), or equivalently, that || \x) - \x) \\ < e. 
This completes the proof of Theorem 1. □ 



3. Phase estimation calculations 

Here we describe, in our notation, the improved phase-estimation procedure of [5, 7], and prove the concentration 
bounds on \ak\j\. Adjoin the state 

r=0 

Apply the conditional Hamiltonian evolution \t){t\ (8) e'"*'^*"/'^. Assume the target state is \uj), so this becomes 
simply the conditional phase J^t \T){T\e^^^*°'^/'^ . The resulting state is 

l*A,to) = V^^I^e " sin \T)\uj). 

T=0 
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-W/2T \ 

. I (A27) 

(A28) 



We now measure in the Fourier basis, and find that the inner product with ■^^2^=0^^"''^ k) l^j) (defining 
S := Xjto — 2'Kk): 

= ^ E e^*^'^^o-2.,) (A23) 

T = 

= -7f^Ee"(e " -e - j (A24) 
= — 6^6*"^^^ — e~^e''^^ fA25) 

= e"5T _ (A26) 

_ i + e'^ / e-'^/^^ e-'^/^^ 

i\/2T \e~2T('5+'^) — e2T('5+'r) e~2T(*— ^) — e2T(< 

_ (1 + e'^)e-'V^^ / 1 1 \ 

i\/2T i^-2isin(^) -2«sin(^) J ' ' 

T \sm{^) sin(^)y 

_ .|a_i, V^cos(f) sin (^)- sin (^) 

T • sin (^) sin (^) ^ 
_ i|(i_^)y2cos(f) 2cos(^) sin(^) 
T sin(^)sm(^) 

Following [5, 7], we make the assumption that 2t^ < 5 < T/10. Further using a — /Q < sin a < a and ignoring 
phases we find that 

47r\/2 Stt 

Thus |afe|j p < 647rV<5^ whenever \k - Ajto/27r| > 1. 

4. The non-Hermitian case 

Suppose A e C*^^-^ with M < N. Generically Ax = b is now underconstrained. Let the singular value decompo- 
sition of A be 

M 

i=i 

with e C^, 1?;^) e and Ai > • • • Am > 0. Let V = span{|i;i) , . . . , \vm)}- Define 

is Hermitian with eigenvalues ±Ai, . . . ,±Am, corresponding to eigenvectors \wf) := :^(|0) \uj) ± |1) \vj}). It also 
has N — M zero eigenvalues, corresponding to the orthogonal complement of V. 
To run our algorithm we use the input |0) \b). If \b) = '^jLi Pj 1%) then 

M 

|0)|6) = 5^,.'?,-^(|«;+) + |«;7)) 
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and running the inversion algorithm yields a state proportional to 

M M 

Dropping the inital |1), this defines our solution |a;). Note that our algorithm does not produce any component in 
V-^, although doing so would have also yielded valid solutions. In this sense, it could be said to be finding the \x) 
that minimizes {x\x) while solving A\x) = \b). 

On the other hand, if M > N then the problem is overconstrained. Let U = span{|Mi) , . . . , |uAr)}. The equation 
A\x) = \b) is satisfiablc only if \b) S U. In this case, applying H to |0) \b) will return a valid solution. But if \b) 
has some weight in JJ-^, then |0) \b) will have some weight in the zero eigenspace of H, which will be flagged as 
ill-conditioned by our algorithm. We might choose to ignore this part, in which case the algorithm will return an |a;) 
satisfying ^ I a;) = J2jLi \uj){uj\ \b). 



5. Optimality 

In this section, we explain in detail two important ways in which our algorithm is optimal up to polynomial factors. 
First, no classical algorithm can perform the same matrix inversion task; and second, our dependence on condition 
number and accuracy cannot be substantially improved. 

We present two versions of our lower bounds; one based on complexity theory, and one based on oracles. We say 
that an algorithm solves matrix inversion if its input and output are 

1. Input: An 0(l)-sparsc matrix A specified either via an oracle or via a poly(log(A'^))-time algorithm that returns 

the nonzero elements in a row. 

2. Output: A bit that equals one with probability {x\ M |a;) ±e, where M = \0){0\(^ 1^/2 corresponds to measuring 
the first qubit and \x) is a normahzed state proportional to A~^ \b) for \b) = |0). 

Further we demand that A is Hermitian and k~^I < A< I. We take e to be a fixed constant, such as 1/100, and later 
deal with the dependency in e. If the algorithm works when A is specified by an oracle, we say that it is relativizing. 
Even though this is a very weak definition of inverting matrices, this task is still hard for classical computers. 

Theorem 4. 1. If a quantum algorithm exists for matrix inversion running in time k^~^ ■ polylog(7V) for some 
5>0, then BQP=PSPACE. 

2. No relativizing quantum algorithm can run in time k^^^ ■ poly log(A'^). 

3. If a classical algorithm, exists for matrix inversion running in time poly(K, log(iV)), then BPP=BQP. 
Given an n-qubit T-gate quantum computation, define U as in (4). Define 



I-Ue-T 



(A34) 



Note that A is Hermitian, has condition number k < 2T and dimension N = 6T2". Solving the matrix inversion 
problem corresponding to A produces an e-approximation of the quantum computation corresponding to applying 
Ui, . . . , Ut, assuming we are allowed to make any two outcome measurement on the output state |a;). Recall that 

(/-[/e-*)"' =^C/'=e-'=/^. (A35) 

fe>0 

We define a measurement Mq, which outputs zero if the time register t is between T + 1 and 2T, and the original 
measurement's output was one. As Pr(T + 1 < fc < 2T) = e^^/(l + + e^^) and is independent of the result of the 
measurement M, we can estimate the expectation of M with accuracy e by iterating this procedure O (l/e'^) times. 
In order to perform the simulation when measuring only the first qubit, define 
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We now define B to be the matrix B, after we permuted the rows and columns such that if 

^ = (1. o)' 

and Cy = , then measuring the first qubit of \y) would correspond to perform Mq on |a;). The condition number 

of C is equal to that of A, but the dimension is now N = 18T2". 

Now suppose we could solve matrix inversion in time K^~''(log(A'')/e)°^ for constants Ci > 2,(5 > 0. Given a 
computation with T < 22"-/18, let m = f iog(fog(n)) ^^'^ ^ = 1/lOOm. For sufficiently large n,e> l/log(n). Then 

^r-. (^^y < (2T)-^ {^^y < r-^c.(nlog(n))-, 

where another constant. 

We now have a recipe for simulating an Ui-qubit Tj-gate computation with Ui+i = rij + log(18Tj) qubits, Tj+i = 
T^~^cs{ni log{ni)y^ gates and error e. Our strategy is to start with an no-qubit To-gate computation and iterate this 
simulation £ < m, times, ending with an n^-qubit T^-gate computation with error < me < 1/100. We stop iterating 

either after m steps, or whenever Tj+i > tI~^^^, whichever comes first. In the latter case, we set £ equal to the first 
i for which Ti+i > T^'^^'^. 

In the case where we iterated the reduction m times, we have Tj < T^^"''/^)' < 2^^~^/'^^"^^° , implying that Tm < no- 

On the other hand, suppose we stop for some £ < m. For each i < £ wc have T,;+i < Thus Ti < 2(i^''/2)'2no 

for each i < £. This allows us to bound = hq 

(I + l) no + TOlog(18). Defining yet another constant, this imphes that Tj_|_i < T.^^'^ 03(710 log(no))'^^. Combining this 
with our stopping condition T^+i > T^~^^'^ we find that 

Te < {C3{no\og{no)ry = poly(no). 

Therefore, the runtime of the procedure is polynomial in no regardless of the reason we stopped iterating the procedure. 
The number of qubits used increases only linearly. 

Recall that the TQBF (totally quantified Boolean formula satisfiability) problem is PSPACE-complcte, meaning 
that any fc-bit problem instance for any language in PSPACE can be reduced to a TQBF problem of length n = 
poly(fc) (see [23] for more information). The formula can be solved in time T < 2^"'/18, by exhaustive enumeration 
over the variables. Thus a PSPACE computation can be solved in quantum polynomial time. This proves the first 
part of the theorem. 

To incorporate oracles, note that our construction of U in (4) could simply replace some of the Ui's with oracle 
queries. This preserves sparsity, although we need the rows of A to now be specified by oracle queries. We can now 
iterate the speedup in exactly the same manner. However, we conclude with the ability to solve the OR problem on 
2" inputs in poly(n) time and queries. This, of course, is impossible [24], and so the purported relativizing quantum 
algorithm must also be impossible. 

The proof of part 3 of Theorem 4 simply formulates a poly(n)-time, n-qubit quantum computation as a k = poly(n), 
N = 2"' ■ poly(n) matrix inversion problem and applies the classical algorithm which we have assumed exists. □ 

Theorem 4 established the universality of the matrix inversion algorithm. To extend the simulation to problems 
which are not decision problems, note that the algorithm actually supplies us with \x) (up to some accuracy). For 
example, instead of measuring an observable M, wc can measure |a;) in the computational basis, obtaining the result i 
with probability | {i\x) |^. This gives a way to simulate quantum computation by classical matrix inversion algorithms. 
In turn, this can be used to prove lower bounds on classical matrix inversion algorithms, where we assume that the 
classical algorithms output samples according to this distribution. 

Theorem 5. No relativizing classical matrix inversion algorithm can run in time N°'2^"' unless 3a + 4/3 > 1/2. 

If we consider matrix inversion algorithms that work only on positive definite matrices, then the N°'2^'^ bound 

becomes N'^2'^^. 

Proof. Recall Simon's problem [17], in which we are given / : — » {0, 1}^" such that f{x) = f{y) iS x + y = a for 
some a G Z2 that we would like to find. It can be solved by running a 3n-qubit 2n + 1-gate quantum computation 
0{n) times and performing a poly(n) classical computation. The randomized classical lower bound is 0(2"/^) from 
birthday arguments. 
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Converting Simon's algorithm to a matrix A yields k w 4n and w 36n2"^". The run-time is N"2^'^ Ri 2(3"+4'')"- • 
poly(r7,). To avoid violating the oracle lower bound, wc must have 3q: + 4/3 > 1/2, as required. □ 

Next, we argue that the accuracy of algorithm cannot be substantially improved. Returning now to the 
problem of estimating {x\M\x), we recall that classical algorithms can approximate this to accuracy e in time 
0(A''Kpoly(log(l/e))). This poly (log (l/e)) dependence is because when writing the vectors \b) and |a;) as bit strings 
means that adding an additional bit will double the accuracy. However, sampling-based algorithms such as ours 
cannot hope for a better than poly(l/e) dependence of the run-time on the error. Thus proving that our algorithm's 
error performance cannot be improved will require a slight redefinition of the problem. 

Define the matrix inversion estimation problem as follows. Given A,b,M,e,K,s with \\A\\ < 1, < k, A s- 

sparse and efficiently row-computable, \b) = |0) and M = |0)(0| ® In/2- output a number that is within e of (a;| M \x) 
with probability > 2/3, where \x) is the unit vector proportional to A~^ \b). 

The algorithm presented in our paper can be used to solve this problem with a small amount of overhead. By 
producing \x) up to trace distance e/2 in time 0(log{N)K^H^ /e), we can obtain a sample of a bit which equals one 
with probability /x with |/z — M | < e/2. Since the variance of this bit is < 1/4, taking l/3e^ samples gives us a 
> 2/3 probability of obtaining an estimate within e/2 of /x. Thus quantum computers can solve the matrix inversion 
estimation problem in time 0{\og{N)K^s'^ /e'^). 

We can now show that the error dependence of our algorithm cannot be substantially improved. 

Theorem 6. 1. If a quantum algorithm exists for the matrix inversion estimation problem running in time 
poly(K,log(iV), log(l/e)) then BQP=PP. 

2. No relativizing quantum algorithm for the matrix inversion estimation problem can run in time N" po\y{K) / 
unless a + P > 1 . 

Proof. 1. A complete problem for the class PP is to count the number of satisfying assignments to a SAT formula. 
Given such formula (j), a quantum circuit can apply it on a superposition of all 2" assignments for variables, 
generating the state 

^ \zi,...,Zn}\<l){Zi,...Zn)) ■ 

zi,...,z„e{o,i} 

The probability of obtaining 1 when measuring the last qubit is equal to the number of satisfying truth assign- 
ments divided by 2". A matrix inversion estimation procedure which runs in time polylog(l/e) would enable 
us to estimate this probability to accuracy 2~^" in time poly(log(2^")) = poly(n). This would imply that BQP 
= PP as required. 

2. Now assume that 4>{z) is provided by the output of an oracle. Let C denote the number of z G {0, 1}" such that 
= 1. From [18], we know that determining the parity of C requires f7(2") queries to (j). However, exactly 
determining C reduces to the matrix inversion estimation problem with N = 2", k = 0{n'^) and e = 2~"~^. By 
assumption we can solve this in time 2^"+''^" • poly(n), implying that a + /? > 1. 

□ 
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